Text generating method and text generator

ABSTRACT

The present invention provides a method and an apparatus for generating a natural text from at least one keyword. The keyword is input through a keyword input unit, and a text and phrase searching and extracting unit extracts any text or phrase containing the keyword. A text generation unit morphologically analyzes and parses the extracted text, and outputs a natural text by combining the extracted text with the keyword.

TECHNICAL FIELD

The present invention relates to a method and apparatus for natural language processing. In particular, the present invention is characterized by a technique for generating a text from several keywords.

BACKGROUND ART

Techniques for parsing or generating a text of a language with a computer have advanced considerably. Generating a text that is as natural as possible is one of the primary concerns in text generation. A requirement for a generating method is that it generate a text that looks almost the same as one written by a human.

A technique that generates a natural-looking text from several input keywords may help people, such as foreigners, who are not familiar with sentence construction.

Since simply naming words in sequence is then enough to convey an intention to another person, the technique may be used in much the same way as machine translation.

For example, text generation techniques may be expected to assist aphasics. Currently, a total of 100,000 persons suffer from aphasia in Japan. It is said that about 80 percent of aphasics are able to vocalize a sentence in a broken manner (namely, as a sequence of words), or are able to select several words to make themselves understood if several word candidates are presented.

For example, when a sequence of words “kanojo (she)/kouen (park)/itta (went)” is spoken or selected, a more natural sentence such as “kanojo wa kouen e itta. (She went to a park.)” or “kanojo to kouen e itta. (I went to a park with her.)” may be generated and presented. The technique thus helps a person communicate with an aphasic patient.

Already available techniques for generating a natural text in response to the input of at least one keyword include a technique for generating a sentence using a template, and a technique for searching a database for a sentence in response to the keyword.

These techniques are effective only when the keyword matches a template, or only when the keyword matches a sentence in the database. In either case, the types of sentences that can be generated are limited.

Another technique has been proposed in which a keyword is replaced with a synonym to increase the hit rate in searching. Because the range of texts that could be generated from a keyword is wide, however, this technique is not sufficient.

DISCLOSURE OF THE INVENTION

The present invention has been developed in view of the aforementioned background, and provides a generating method for generating a natural text from at least one keyword.

More specifically, the present invention generates a text through the following steps.

In an input step for inputting at least one word serving as a keyword, the words “kanojo (she)”, “kouen (park)”, and “itta (went)” are input, for example.

The process then proceeds to an extracting step for extracting, from a database, a text or a phrase related to the keyword. The database contains a number of sample sentences, and, for example, texts and phrases containing the word “kanojo” are searched for and extracted.

In a text generation step, an optimum text using the input keywords is generated by combining the extracted texts or phrases. If a text containing “kanojo”, “e”, and “itta” is present in the database, the combination results in the text “kanojo wa kouen e itta”.

Texts only may be extracted in the extracting step, and the extracted text may be morphologically analyzed and parsed to acquire a dependency structure of the text. By forming a dependency structure containing the keyword, a more natural text is generated.

In the course of forming the dependency structure containing the keyword, a dependency probability of the entire text is determined using a dependency model. The text having the maximum probability is generated as the optimum text.

In accordance with the present invention, a text having a natural word order may be generated using a word order model. In the text generation step, the word order model may be used in the middle of or prior to the generation of the dependency structure.

In the text generation step, it is determined based on a learning model whether there is a word to be inserted between any two keywords in all arrangements of the keywords. The word insertion process starts with the word having the highest probability in the learning model, and is repeated until the probability that there is no word to be inserted between any keywords becomes the highest. Since an inserted word is itself treated as a keyword, a further word insertion may be performed between the inserted words. An optimum word insertion is thus performed, and a natural text is generated even when the number of given keywords is small.

In accordance with the present invention, the database may contain a text having a characteristic text pattern, and a text reflecting the characteristic text pattern may be generated in the text generation step.

For example, the database may contain texts characteristic of particular writing styles and expressions, so that a generated text conforms to those writing styles and expressions.

The present invention provides a text generation apparatus for generating a text of a sentence. The text generation apparatus includes input means for inputting at least one word as a keyword, extracting means for extracting, from a database containing a plurality of texts, a text or a phrase related to the keyword, and text generation means for generating an optimum text based on the input keyword by combining the extracted text or phrase.

In an arrangement where the text extracting means extracts the text, the text generation means may include parser means for morphologically analyzing and parsing the extracted text and acquiring a dependency structure of the text, and dependency structure generation means for generating a dependency structure containing the keyword.

In the text generation means, the dependency structure generation means may determine the probability of dependency of the entire text using a dependency model, and generate a text having a maximum probability as an optimum text.

In the middle of or prior to the generation of the dependency structure, the text generation means may generate an optimum text having a natural word order based on a word order model.

The text generation means may include word insertion means that determines, using a learning model, whether there is a word to be inserted between any two keywords in all arrangements of the keywords, and performs a word insertion process starting with a word having the highest probability, wherein the word insertion means repeats the word insertion until a probability that there is no word to be inserted between any keywords becomes the highest.

In the text generation apparatus, as already discussed, the database contains a text having a characteristic text pattern, and a text in compliance with the characteristic text pattern is generated.

With pattern selecting means provided, the text generation apparatus may appropriately select and switch among a plurality of text patterns.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a text generation apparatus in accordance with the present invention.

FIG. 2 is a subgraph illustrating a dependency structure analyzed by a text generation unit.

FIG. 3 is a dependency tree generated by the text generation unit.

FIG. 4 is a dependency tree in another sample sentence.

FIG. 5 illustrates an example of calculation of a probability that an order of word dependency is appropriate.

Reference numerals are designated as follows: 1: text generation apparatus, 2: keyword to be input, 3: output text, 10: keyword input unit, 11: text and phrase searching and extracting unit, 12: text generation unit, 12 a: parser, 12 b: constructor, 12 c: evaluator, and 13: database.

BEST MODE FOR CARRYING OUT THE INVENTION

The embodiments of the present invention will now be discussed with reference to the drawings. The present invention is not limited to the following embodiments and may be appropriately modified.

FIG. 1 illustrates a text generation apparatus (1) in accordance with the present invention. The text generation apparatus (1) includes a keyword input unit (10), a text and phrase searching and extracting unit (11), a text generation unit (12), and a database (13). The database (13) holds a plurality of texts in a table in advance, and the content of the table may be modified as necessary. By modifying the content, a variety of texts may be produced, as will be discussed later.

If three keywords (2), “kanojo”, “kouen”, and “itta”, are input through the keyword input unit (10), the text and phrase searching and extracting unit (11) searches the database (13) for, and extracts, texts or phrases each containing at least one of the keywords.

The text generation unit (12) combines the extracted texts or phrases, thereby outputting a natural text (3) “kanojo wa kouen e itta.”

This process will now be discussed in more detail. In response to the n keywords input through the keyword input unit (10), the text and phrase searching and extracting unit (11) extracts sentences containing the keywords from the database (13). A sentence containing only one of the keywords is perfectly acceptable. The extracted sentences are then sent to the text generation unit (12).
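
The extracting step described above can be sketched as follows. This is a minimal illustration only: the function name `extract_sentences`, the three-sentence `DATABASE`, and the whitespace tokenization of romanized Japanese are assumptions made for the example and do not appear in the original disclosure.

```python
from typing import List


def extract_sentences(keywords: List[str], database: List[str]) -> List[str]:
    """Return every sentence that contains at least one keyword as a whole token."""
    wanted = set(keywords)
    # Strip the sentence-final period so that "itta." matches the keyword "itta".
    return [s for s in database if wanted & set(s.rstrip(".").split())]


# Hypothetical database of romanized, space-delimited sample sentences.
DATABASE = [
    "kanojo wa gakkou e itta.",
    "kare wa kouen de asonda.",
    "watashi wa hon wo yonda.",
]

hits = extract_sentences(["kanojo", "kouen", "itta"], DATABASE)
for sentence in hits:
    print(sentence)
```

A sentence is kept as soon as one keyword matches, which mirrors the statement that a single contained keyword is acceptable.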

The text generation unit (12) includes the parser (12 a), the constructor (12 b), and the evaluator (12 c). The parser (12 a) morphologically analyzes and parses the extracted sentence.

One available morphological analysis method analyzes morphemes based on a maximum entropy (ME) model, as disclosed in Japanese Patent Application No. 2001-139563 filed by the applicant of this application.

When morphological analysis is formulated with an ME model, the likelihood that a character string is a morpheme is expressed as a probability.

More specifically, given a sentence, a morphological analysis of that sentence is interpreted as assigning one of two identification codes, namely, “1” or “0”, to each character string, indicating whether or not the character string is a morpheme.

To impart syntactic attributes as well, the code “1” assigned to a morpheme is subdivided according to the number of syntactic attributes. If the number of syntactic attributes is n, an identification code from “0” to “n” is assigned to each character string.

In the technique using an ME model in morphological analysis, the likelihood that a character string is a morpheme and has a given syntactic attribute is applied to the probability distribution function of the ME model. In the morphological analysis, regularity is found in the probability representing this likelihood.

Features in use include information representing the character type of a character string of interest, whether that character string is registered in a dictionary, a change in character type from an immediately preceding morpheme, and the part of speech of the immediately preceding morpheme. If a single sentence is given, the sentence is divided into morphemes so that the product of probabilities is maximized, and syntactic attributes are imparted to the morphemes. Any known algorithm may be used to search for an optimum solution.
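
As an illustration of dividing a sentence so that the product of probabilities is maximized, the sketch below uses dynamic programming, which is one possible choice among the known search algorithms. The table `MORPHEME_PROB` is a hypothetical stand-in for the probabilities an ME model would supply, and syntactic attributes are omitted for brevity.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical morpheme likelihoods standing in for the output of an ME model.
MORPHEME_PROB: Dict[str, float] = {
    "kanojo": 0.9, "wa": 0.8, "kouen": 0.9, "e": 0.7, "itta": 0.9,
    "ka": 0.2, "nojo": 0.05, "kou": 0.3, "en": 0.3,
}


def segment(text: str, max_len: int = 6) -> Tuple[List[str], float]:
    """Split text into morphemes so that the product of probabilities is maximal."""
    # best[i] holds (log-probability, morphemes) for the best split of text[:i].
    best: List[Tuple[float, List[str]]] = [(0.0, [])] + [(-math.inf, [])] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            p = MORPHEME_PROB.get(piece)
            if p is None or best[start][0] == -math.inf:
                continue  # unknown piece, or no valid split reaches this start
            score = best[start][0] + math.log(p)
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [piece])
    log_p, morphemes = best[len(text)]
    return morphemes, math.exp(log_p)


morphemes, probability = segment("kanojowakouenitta".replace("nitta", "neitta"))
print(morphemes, probability)
```

Working in log-probabilities keeps the product numerically stable for longer inputs; the maximizing split is unchanged.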

The morphological analysis method using the ME model provides excellent performance; for example, it performs an effective morphological analysis even if a sentence contains an unknown word. The above method is therefore particularly effective in the embodiments of the present invention. The present invention is not, however, limited to the above method, and any morphological analysis method may be used.

A parsing method using an ME model may be used as the parsing method of the parser (12 a), although any other parsing method may be used. The following method is used in one embodiment. The text generation unit (12) may reference the database (13) and train the ME model on a plurality of texts contained in the database (13).

Of the parsing steps, the dependency analysis is described here. The dependency relation in the Japanese language, that is, which word modifies which word, is said to have the following characteristics.

-   (1) The dependency relation runs in one direction, from left to right in a sentence.
-   (2) Dependency relations do not cross. (Hereinafter, this characteristic is referred to as the non-crossing condition.)
-   (3) A modifying segment has only one modified segment.
-   (4) In many cases, the determination of a modification target requires no preceding context.
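
Characteristics (1) to (3) can be checked mechanically. The sketch below is an illustration under an assumed representation, not part of the disclosure: `heads[i]` is the index of the phrase that phrase `i` modifies, and the sentence-final phrase is marked with `-1`. Storing a single head per phrase enforces characteristic (3) by construction.

```python
from typing import List


def is_valid_dependency(heads: List[int]) -> bool:
    """Check left-to-right direction and the non-crossing condition."""
    n = len(heads)
    if n == 0 or heads[-1] != -1:
        return False
    # (1): every non-final phrase modifies a phrase located to its right.
    for i in range(n - 1):
        if not (i < heads[i] < n):
            return False
    # (2): arcs (i, heads[i]) and (j, heads[j]) with i < j must not cross,
    # i.e. the pattern i < j < heads[i] < heads[j] is forbidden.
    for i in range(n - 1):
        for j in range(i + 1, n - 1):
            if j < heads[i] < heads[j]:
                return False
    return True


# "kinou / Taro wa / tenisu wo / shita.": all three phrases modify the verb.
print(is_valid_dependency([3, 3, 3, -1]))
# Phrase 0 -> 2 and phrase 1 -> 3 cross each other.
print(is_valid_dependency([2, 3, 3, -1]))
```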

In view of these characteristics, one embodiment of the present invention achieves a high analysis precision by combining a statistical technique with a method of analyzing a sentence from the end of the sentence toward its head.

Two phrases at a time are successively picked up from the end of the sentence, and whether or not the two phrases are in a dependency relation is statistically determined. Information within each phrase and information between the phrases are used as features, and the choice of features determines the precision.

Each phrase is divided into a front portion, the headword, and a back portion, a postposition or a conjugation. Together with the features of each portion, the distance between the phrases and the presence or absence of punctuation are taken into consideration as features.

Also considered are the presence or absence of parentheses, the presence or absence of the postposition “wa”, whether or not the same postposition or the same conjugation as that of the modifying phrase is present between the phrases, and combinations of features.

The ME model handles a variety of these features.

This method achieves a precision as high as that of a known method using a decision tree or maximum likelihood estimation, although the learning data is only one-tenth the size of the data used by the known technique. This technique achieves the highest standard of precision among systems based on learning.

In the known art, features effective for predicting whether two phrases are in a dependency relation are learned from information obtained from learning data. A more precise dependency analysis is performed by learning information effective for predicting which of three states a preceding phrase is in: “modifying a phrase coming beyond a subsequent phrase”, “modifying the subsequent phrase”, or “modifying a phrase prior to the subsequent phrase”.

The use of the morphological analysis method and parsing method, based on the ME model, allows the parser (12 a) to precisely analyze a text searched for and extracted from the database (13), and to acquire a dependency structure of the text. The dependency structure is represented as a subgraph. In the subgraph, each node represents a phrase, and each arc represents a dependency.

All subgraphs containing at least one keyword are extracted, and the frequency of occurrence of each subgraph is examined. A node may hold generalized information (a proper noun class such as personal name or systematic name, or a part of speech).
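
The frequency count over keyword-bearing subgraphs might look like the sketch below. It is a simplification made for illustration: only single arcs (the smallest subgraphs) are counted, and the `PARSED` data, with content words already generalized to `<noun>` and `<verb>`, is hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Arc = Tuple[str, str]  # (modifying phrase, modified phrase)


def count_keyword_arcs(parsed: List[List[Arc]], keywords: List[str]) -> Counter:
    """Count each arc in which either phrase contains one of the keywords."""
    counts: Counter = Counter()
    for arcs in parsed:
        for modifier, head in arcs:
            tokens = modifier.split() + head.rstrip(".").split()
            if any(k in tokens for k in keywords):
                counts[(modifier, head)] += 1
    return counts


# Hypothetical parser output: one list of arcs per database sentence.
PARSED = [
    [("kanojo wa", "<verb>."), ("<noun> e", "<verb>.")],
    [("kanojo wa", "<verb>."), ("<noun> wo", "<verb>.")],
    [("kare wa", "<verb>.")],
]

frequencies = count_keyword_arcs(PARSED, ["kanojo"])
print(frequencies.most_common(1))
```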

Subgraphs are extracted from the database (13) according to the above keywords and are analyzed. FIGS. 2 a and 2 b illustrate the subgraphs having high frequencies of occurrence. Referring to FIG. 2 a, the keyword (kanojo wa) is a node (parent node 1) (20), “<noun>+e” is a node (parent node 2) (21), and “<verb>.” is a node (child node) (22), and a dependency relation (23) results.

The subsequent process may be performed by the constructor (12 b) of the text generation unit (12). In this embodiment, however, the analysis and the generation performed in the text generation unit (12) form an integral process and are performed in cooperation.

It is assumed that the n input keywords are in a dependency relation, and a dependency structure tree containing the n input keywords is generated. To generate the tree, the subgraphs are combined.

For example, when the three keywords are input, it is assumed that the three keywords are in a dependency relation, and the subgraphs are combined (in this case, aligned). The trees shown in FIGS. 3 a and 3 b thus result.

The above-referenced dependency model is again used to select which of the two generated trees (FIGS. 3 a and 3 b) is appropriate.

For ordering, the ratio of agreement between a combination of subgraphs, the frequency of occurrence, and the dependency relation are taken into consideration. If n is three or more, an ambiguity is present in the dependency relation between the n words. To resolve the ambiguity, a dependency model is used. A word having a larger probability determined from the dependency model is ordered with higher priority.

As a result, the probability of the tree of FIG. 3 a is higher, and the tree of FIG. 3 a is selected as the optimum dependency relation.
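
Selecting between candidate trees can be pictured with the sketch below. It assumes, purely for illustration, that the probability of a whole tree is the product of its arc probabilities; the arc probabilities in `ARC_PROB` and the two candidate trees are hypothetical values and are not taken from FIGS. 3 a and 3 b.

```python
import math
from typing import Dict, List, Tuple

Arc = Tuple[str, str]  # (modifying phrase, modified phrase)

# Hypothetical arc probabilities standing in for a learned dependency model.
ARC_PROB: Dict[Arc, float] = {
    ("kanojo wa", "itta."): 0.8,
    ("kouen e", "itta."): 0.9,
    ("kanojo wa", "kouen e"): 0.1,
}


def tree_probability(tree: List[Arc]) -> float:
    """Product of arc probabilities; arcs unknown to the model count as 0."""
    return math.prod(ARC_PROB.get(arc, 0.0) for arc in tree)


def best_tree(trees: List[List[Arc]]) -> List[Arc]:
    """Return the candidate tree with the highest overall probability."""
    return max(trees, key=tree_probability)


TREE_A = [("kanojo wa", "itta."), ("kouen e", "itta.")]
TREE_B = [("kanojo wa", "kouen e"), ("kouen e", "itta.")]

print(best_tree([TREE_A, TREE_B]), tree_probability(TREE_A))
```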

In the Japanese language, the constraints on word order are relatively mild, and once the dependency relation is determined, a result close to a natural text is obtained. The languages the present invention intends to cover are not limited to Japanese; the present invention is applicable to other languages.

To output a more natural text in Japanese, the most natural word order is preferably selected. In accordance with the present invention, the following re-arrangement of word order is possible.

Starting from the tree having the higher priority, a sentence is re-arranged in a natural word order and is output. Used to this end is a word order model based on the ME model that generates a naturally ordered sentence from a dependency structure. The database (13) may be referenced to train the word order model.

In the Japanese language, which is said to be free in word order, linguistic research performed so far shows word order tendencies; for example, an adverb representing time tends to appear before a subject, and a long modification phrase tends to appear toward the front of a sentence. If such tendencies are captured as patterns, the patterns become information effective in the generation of natural sentences. The word order here refers to the order among dependents, namely, the order of phrases with respect to the same modified phrase. Various factors determine word order. For example, a long modification phrase tends to appear before a short modification phrase. A phrase containing a context-pointing word such as “sore (that)” tends to appear toward the front.

The embodiment of the present invention provides a technique to learn, from a predetermined text, the relationship between elements in a sentence and the tendency of word order, namely, a regularity. This technique learns the word order by considering not only which elements contribute to the determination of word order and to what degree, but also which combinations of elements result in which word order tendencies. This technique thus deductively learns from a text. The degree of contribution of each element is efficiently learned using the ME model. The word order is learned by sampling two phrases at a time regardless of the number of modified phrases.

To generate a sentence, the learned model is used. Given phrases that are in a dependency relation, the order of the dependency phrases is determined. The word order is decided as follows.

All possible arrangements of the dependency phrases are considered. For each arrangement, the probability that the order of the dependency phrases is appropriate is determined based on the learned model. The probability is then replaced with “0” or “1”, respectively representing appropriateness or inappropriateness, and is then applied to the probability distribution function of the ME model.

The arrangement presenting the maximum overall probability is taken as the solution. Two dependency phrases are successively sampled, and the probability of the order of the two phrases is calculated. The overall probability is calculated as the product of these probabilities.

For example, an optimum word order is now determined for the sentence “kinou (yesterday)/tenisu wo (tennis)/Taro wa (personal name)/shita (played).” In the same way as already discussed, a dependency tree is produced. The structure tree having the highest probability is obtained as shown in FIG. 4.

More specifically, three phrases modify the verb “shita.” (43), namely, “kinou” (40), “tenisu wo” (41), and “Taro wa” (42). The order of these three phrases is determined.

FIG. 5 illustrates a calculation example (50) of the probability that the order of the dependency phrases is appropriate.

Three combinations of two phrases are sampled, namely, “kinou” and “Taro wa”, “kinou” and “tenisu wo”, and “Taro wa” and “tenisu wo”. The probability that each word order is appropriate is determined based on the learned regularity.

For example, the probability of the word order of “kinou” and “Taro wa” in the chart is “p*(kinou, Taro wa)”, and is assumed to be 0.6. Similarly, the probability of the word order of “kinou” and “tenisu wo” is 0.8, and that of “Taro wa” and “tenisu wo” is 0.7. The probability of the word order (51) in the first row of FIG. 5 is determined by multiplying these probabilities, and is thus 0.6 × 0.8 × 0.7 = 0.336.

The overall probability is calculated for each of the 6 possible word orders (51 through 56), and the word order “kinou/Taro wa/tenisu wo/shita.” (51), which has the highest probability, is determined to be the optimum word order.
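
The calculation over the six word orders can be reproduced with the sketch below. The three pairwise probabilities are the values given in the text; the treatment of a reversed pair as one minus the stated probability is an assumption made for this illustration, as are the function names.

```python
from itertools import combinations, permutations
from typing import Dict, Sequence, Tuple

# Pairwise probabilities from the text: p*(first, second) that first precedes second.
PAIR_PROB: Dict[Tuple[str, str], float] = {
    ("kinou", "Taro wa"): 0.6,
    ("kinou", "tenisu wo"): 0.8,
    ("Taro wa", "tenisu wo"): 0.7,
}


def pair_probability(first: str, second: str) -> float:
    """Probability that `first` appropriately precedes `second`."""
    if (first, second) in PAIR_PROB:
        return PAIR_PROB[(first, second)]
    # Assumption for this sketch: the reversed order gets the complement.
    return 1.0 - PAIR_PROB[(second, first)]


def order_probability(order: Sequence[str]) -> float:
    """Product of the pairwise probabilities over every pair in the order."""
    p = 1.0
    for a, b in combinations(order, 2):  # a always precedes b in `order`
        p *= pair_probability(a, b)
    return p


def best_order(phrases: Sequence[str]) -> Tuple[str, ...]:
    """Evaluate all arrangements and return the one with the highest product."""
    return max(permutations(phrases), key=order_probability)


PHRASES = ["tenisu wo", "kinou", "Taro wa"]
winner = best_order(PHRASES)
print("/".join(winner) + "/shita.", round(order_probability(winner), 3))
```

Enumerating every permutation is practical here because only a handful of phrases modify the same verb.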

Similarly, for the preceding text “kanojo wa/kouen e/itta.”, the probabilities of a smaller number of combinations are calculated, and the word order “kanojo wa kouen e itta.” is determined to be the optimum text.

If a generalized node is contained in the word order model, the node is presented as is, so that the positions where a personal name, a geographic name, or a date is readily placed are known.

In the word order model discussed above, the dependency structure is received as input. In accordance with the embodiment of the present invention, a word order model is also used in the process of building the dependency structure.

As described above, the constructor (12 b) in the text generation unit (12) generates a plurality of text candidates considered to be optimum using the dependency model and the word order model. In accordance with the present invention, these candidates may be directly output from the text generation apparatus (1). In the discussion that follows, however, the text generation unit (12) includes the evaluator (12 c), and the text candidates are evaluated for re-ordering.

The evaluator (12 c) evaluates the text candidates by putting together various information including the order of the input keywords, the frequency of occurrence of the extracted pattern, and a score calculated from the dependency model and the word order model. The evaluator (12 c) may reference the database (13).

For example, a keyword having a high order is considered an important keyword, and a text candidate in which that keyword plays a particularly important role is evaluated as an optimum text. In the above discussion, the probability is determined separately on a per-model basis, that is, for each of the dependency model and the word order model. By putting these results together, a comprehensive assessment may be performed.

With the evaluator (12 c) functioning, the texts considered particularly suitable are ranked from among the candidates formed as natural texts.

The text generation apparatus (1) of the present invention may be incorporated into another language processing system, and may provide a plurality of outputs or a single output having the highest rank.

The text generation apparatus (1) may output texts having a rank higher than a predetermined value, or texts whose probability or score is higher than a threshold, and the outputs may then be manually selected.
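
The ranking and thresholded output described above might be combined as in the following sketch. The disclosure does not specify how the individual scores are put together, so the weighted sum of log-scores, the weights, and the candidate values below are all assumptions chosen only to make the example concrete.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    text: str
    dependency_prob: float   # score from the dependency model
    word_order_prob: float   # score from the word order model
    pattern_frequency: int   # frequency of occurrence of the extracted pattern


def score(c: Candidate, w_dep: float = 1.0, w_order: float = 1.0,
          w_freq: float = 0.5) -> float:
    """Hypothetical combined score: weighted sum of log-scaled evidence."""
    return (w_dep * math.log(c.dependency_prob)
            + w_order * math.log(c.word_order_prob)
            + w_freq * math.log(1 + c.pattern_frequency))


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates from best to worst combined score."""
    return sorted(candidates, key=score, reverse=True)


def select_above(candidates: List[Candidate], threshold: float) -> List[Candidate]:
    """Keep only ranked candidates whose score reaches the threshold."""
    return [c for c in rank(candidates) if score(c) >= threshold]


CANDIDATES = [
    Candidate("kanojo to kouen e itta.", 0.3, 0.8, 4),
    Candidate("kanojo wa kouen e itta.", 0.6, 0.8, 12),
]

ranked = rank(CANDIDATES)
shortlist = select_above(CANDIDATES, threshold=0.0)
print([c.text for c in ranked], [c.text for c in shortlist])
```

Returning the whole ranked list supports presenting several outputs for manual selection, while taking `ranked[0]` gives the single highest-ranked output.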

The text generation unit (12) receives the candidates built by the constructor (12 b) only. Furthermore, the evaluator (12 c) may select the text candidates by evaluating an entire sentence containing a plurality of texts, or may evaluate the text candidates in the entire sentence as a whole, thereby deciding on a single text candidate.

If a small number of phrases in an entire sentence are unnatural in terms of consistency between a prior phrase and a subsequent phrase, the results are returned to the process of the parser (12 a) or the constructor (12 b) so that another candidate is built and a text that is natural within the entire sentence is output.

The text (3) “kanojo wa kouen e itta.”, generated with an optimum syntax and word order by the text generation unit (12), is output from the text generation apparatus (1). Here, the one text (3) considered the most natural is output.

In accordance with the present invention, unlike the known art, a natural text is generated and output by inputting at least one keyword (2) and by referencing the database (13).

The present invention provides an insertion method that is performed when the keywords are not sufficient.

If n keywords are input, the gaps between words are filled using the ME model. Two of the n keywords are input to the model, and the insertion process is performed between the two keywords.

A determination is made of whether there is a word to be inserted between any two keywords. If there are a plurality of words that could be inserted between the two keywords, the probability of occurrence of each of the words is determined. The insertion operation is performed starting with the word having the highest probability. This process is performed for every pair of words.

The insertion operation is terminated when the probability of “no insertion” becomes the highest between any two keywords.

Even when sufficient keywords are not provided, the missing words are compensated for to some degree using the ME model in the insertion process. Even when a natural text cannot be generated from the input keywords alone, an effective text may thus be output.

The insertion process may be performed during the text generation of the text generation unit.

For example, when “kanojo”, “kouen”, and “itta.” are provided as described above, “wa”, “ga”, “to”, etc. may occur between “kanojo” and “kouen”, and “wa”, which has the highest probability of occurrence, is inserted therebetween.

Similarly, “e”, “ni”, etc. may occur between “kouen” and “itta.”, and “e”, which has the highest probability, is inserted therebetween.

By repeating the insertion, the probabilities of the insertions over the entire sentence are calculated, and the product of all the probabilities is calculated. The combination of insertions providing the highest probability for the entire sentence is adopted, and the text is generated. In this case, “kanojo wa kouen e itta.” is obtained, which is the same result as that of the aforementioned method of the present invention.
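
A minimal sketch of the insertion loop follows. It implements the greedy procedure (insert the most probable word, repeat until “no insertion” is the most probable outcome for every gap) rather than an exhaustive product over all combinations. The table `MODEL` and its probabilities are hypothetical stand-ins for the learned ME model; `None` denotes “no insertion”.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical insertion distributions for adjacent word pairs.
MODEL: Dict[Tuple[str, str], Dict[Optional[str], float]] = {
    ("kanojo", "kouen"): {"wa": 0.5, "ga": 0.2, "to": 0.2, None: 0.1},
    ("kouen", "itta."): {"e": 0.6, "ni": 0.3, None: 0.1},
}


def insertion_distribution(left: str, right: str) -> Dict[Optional[str], float]:
    """Pairs unknown to the toy model are assumed to need no insertion."""
    return MODEL.get((left, right), {None: 1.0})


def insert_words(keywords: List[str]) -> List[str]:
    """Insert words until 'no insertion' is the top choice for every gap."""
    words = list(keywords)
    while True:
        best: Optional[Tuple[float, int, str]] = None  # (probability, gap, word)
        for i in range(len(words) - 1):
            dist = insertion_distribution(words[i], words[i + 1])
            word, prob = max(dist.items(), key=lambda kv: kv[1])
            if word is not None and (best is None or prob > best[0]):
                best = (prob, i, word)
        if best is None:
            return words  # every gap prefers "no insertion"
        # The inserted word becomes a keyword for the next round.
        words.insert(best[1] + 1, best[2])


result = insert_words(["kanojo", "kouen", "itta."])
print(" ".join(result))
```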

Based on the aforementioned text generation method, the present invention thus inserts keywords and generates a text using the insertion method.

The text generation method of the present invention is particularly appropriate for use in the following applications.

The text generation method finds application in assisting aphasics in the generation of sentences. A natural sentence is generated from a broken sentence (a string of words), such as “kanojo kouen itta.”, and sentence candidates such as “kanojo ga kouen e itta.” and “kanojo to kouen e itta.” are output. The patient conveys the content he or she wants to express by simply approving a presented text. The patient's opportunities for communication are thus increased.

When keywords are lacking, the insertion technique is used, a plurality of texts are presented, and the patient simply selects one of the texts. Such an application is sufficiently advantageous.

Incorporating the method into an apparatus that converses interactively with a human being helps communication therebetween. More specifically, keywords are appropriately extracted from a sentence the human being voices, and a new sentence is generated and voiced. If typical information such as 5Ws and 1H information is missing when a sentence is generated, the generation of another sentence that asks about the missing information may be contemplated.

A system having a similar arrangement may generate a natural sentence from recognized speech, and ask a question. Human beings do not always hear a conversation distinctly, but understand the conversation by interpolating what they fail to hear distinctly. Likewise, a sentence is generated based on the recognized portion of the conversation, and a question is asked. Since it is expected that a mistakenly recognized portion will be voiced emphatically in a corrected form, a correct sentence may be generated by exchanging sentences several times.

In combination with the insertion technique, another system that automatically creates a new story may be provided. For example, Japanese folk stories of Momo Taro and Urashima Taro may be contained in a database, and when “ojiisan (an old man), obasan (an old woman), yama (hill), and kame (turtle)” are input, a new story different from the folk stories may be created. Newly inserted keywords may include “kawa (river), momo (peach), and ryugujo (the Sea God's Palace)”.

The more stories there are in the database, the more unexpected the resulting story becomes, and the more difficult the reader finds it to associate the story with its source stories.

A sentence and keywords within the sentence may be input, and a sentence containing the keywords and having an appropriate length may be generated. A composition writing system is thus provided. An output sentence that is shorter than the original one may serve as a summary. It is also contemplated that a more detailed sentence is generated by adding typical information to the output sentence. The system, unlike the known system, generates a sentence from the important keywords in a self-contained manner, thereby providing a more natural summary.

A sentence with a lot of redundancy, possibly written by an unskilled writer, may be corrected, and may be changed into a smoother sentence with phrases added.

The technique of the present invention may be used to convert the style of a sentence. Keywords are extracted from the sentence, and a sentence is re-generated based on the keywords. The resulting sentence has expressions unique to the database on which it is based. For example, with a novel by a certain writer used as the database, a re-written sentence may have the style of that writer.

The text generation method may be used to assist in inputting sentences on mobile terminals, which are currently in widespread use. An easy-to-read sentence may be produced on a mobile terminal on which a user has difficulty inputting a sentence. For example, when several words are input to the terminal, sentence candidates are presented. The user selects one of the sentence candidates, thereby generating a sentence as good as one composed manually. The user simply inputs words, and is thus freed from composing a sentence in detail.

If a database stores mails actually written by the user, the user composes sentences matching the user's own style during mail writing.

In accordance with the present invention, a variety of text patterns such as styles and expressions are stored in the database, and a text that reflects the text patterns is automatically generated. A text reflecting personality is easily generated.

The database stores texts containing a plurality of characteristic text patterns, and a plurality of databases are arranged. The user designates a text pattern or switches the database, thereby generating a text having any desired text pattern.

By inputting keywords from itemized memos, a draft of a lecture for a meeting or an article may be written. By inputting the resume of a person, a letter of introduction for the person may be written.

The present invention constructed as previously discussed provides the following advantages.

Several words are input in the input step, and a text or a phrase is extracted from the database in the extracting step. Extracted texts or phrases are combined to generate an optimum text containing the input keyword.
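
The extracting step may be illustrated by a minimal sketch. The function name `extract_texts`, the in-memory list used as the database, and the whitespace tokenization are all assumptions for illustration; they are not taken from the embodiment, which would use morphological analysis for Japanese text.

```python
from typing import List

def extract_texts(database: List[str], keywords: List[str]) -> List[str]:
    """Return every database text that contains at least one input keyword."""
    hits = []
    for text in database:
        tokens = text.split()  # assumption: tokens are separated by spaces
        if any(keyword in tokens for keyword in keywords):
            hits.append(text)
    return hits

# Hypothetical database of romanized example texts.
database = [
    "kanojo wa kouen e itta",
    "kare wa hon o yonda",
    "kanojo to eiga o mita",
]

extracted = extract_texts(database, ["kanojo", "kouen", "itta"])
```

The extracted texts would then be passed to the text generation step, which combines them with the input keywords.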

The extracted text is morphologically analyzed and parsed to obtain a dependency structure of the text. A more natural and precise text generation is thus achieved.

In the course of forming the dependency structure containing the keyword, the dependency probability of the entire text is determined using the dependency model. The text having the highest probability is generated as the optimum text. Thus, even more natural text generation is achieved.
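
The selection of the text having the highest probability may be sketched as follows. This sketch assumes, for illustration only, that the probability of the entire text is the product of independent per-dependency probabilities; the probability values, the fallback value for unseen dependencies, and all names are hypothetical and do not reproduce the dependency model itself.

```python
from math import prod
from typing import Dict, List, Tuple

Dependency = Tuple[str, str]  # (modifier phrase, head phrase)

def structure_probability(deps: List[Dependency],
                          model: Dict[Dependency, float],
                          default: float = 1e-6) -> float:
    """Probability of a whole dependency structure, assuming independent links."""
    return prod(model.get(dep, default) for dep in deps)

def best_candidate(candidates: Dict[str, List[Dependency]],
                   model: Dict[Dependency, float]) -> str:
    """Return the candidate text whose dependency structure is most probable."""
    return max(candidates,
               key=lambda text: structure_probability(candidates[text], model))

# Hypothetical dependency probabilities (illustrative values only).
model = {
    ("kanojo wa", "itta"): 0.6,
    ("kanojo to", "itta"): 0.3,
    ("kouen e", "itta"): 0.7,
}

# Two candidate texts built from the keywords kanojo / kouen / itta.
candidates = {
    "kanojo wa kouen e itta": [("kanojo wa", "itta"), ("kouen e", "itta")],
    "kanojo to kouen e itta": [("kanojo to", "itta"), ("kouen e", "itta")],
}

best = best_candidate(candidates, model)
```

With these illustrative values the first candidate scores 0.6 × 0.7 and the second 0.3 × 0.7, so the first is selected.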

In connection with word order that has conventionally been difficult to address, a text with a natural word order is generated using the word order model.
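
One way to picture the use of a word order model is a toy sketch that scores every ordering of the phrases and keeps the most probable one. The pairwise "precedes" probabilities, their values, and the function names below are assumptions for illustration only and are not the word order model of the embodiment.

```python
from itertools import permutations
from math import prod
from typing import Dict, Sequence, Tuple

def order_probability(order: Sequence[str],
                      precedes: Dict[Tuple[str, str], float],
                      default: float = 0.5) -> float:
    """Product, over every pair in the ordering, of P(earlier precedes later)."""
    return prod(
        precedes.get((order[i], order[j]), default)
        for i in range(len(order))
        for j in range(i + 1, len(order))
    )

def best_order(phrases: Sequence[str],
               precedes: Dict[Tuple[str, str], float]) -> Tuple[str, ...]:
    """Try every permutation of the phrases and return the most probable one."""
    return max(permutations(phrases),
               key=lambda order: order_probability(order, precedes))

# Hypothetical pairwise ordering probabilities (illustrative values only).
precedes = {
    ("kanojo wa", "kouen e"): 0.8, ("kouen e", "kanojo wa"): 0.2,
    ("kanojo wa", "itta"): 0.99, ("itta", "kanojo wa"): 0.01,
    ("kouen e", "itta"): 0.99, ("itta", "kouen e"): 0.01,
}

ordered = best_order(["itta", "kouen e", "kanojo wa"], precedes)
text = " ".join(ordered)
```

Enumerating all permutations is practical only for a handful of phrases; it is used here simply to keep the sketch short.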

A determination is made in the text generation step whether there is a word to be inserted between any two keywords in all arrangements of the keywords using a learning model. The word insertion is performed starting with the word having the highest probability in the learning model. The word insertion is repeated until the probability that no word to be inserted is present between any two keywords becomes the highest. An optimum insertion is thus achieved. Even with a small number of keywords, a natural text is generated.
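
The repeated word insertion may be sketched as a greedy loop. As a simplification, this sketch handles a single fixed arrangement of the keywords rather than all arrangements, and it replaces the learning model with a hypothetical lookup table that gives, for each adjacent pair of words, the probability of each insertable word (with None standing for "no word to be inserted"). All names and values are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical learned model: (left word, right word) -> {word or None: probability}.
Model = Dict[Tuple[str, str], Dict[Optional[str], float]]

def insert_words(keywords: List[str], model: Model, max_steps: int = 20) -> List[str]:
    """Greedily insert the most probable word until no insertion is most likely."""
    words = list(keywords)
    for _ in range(max_steps):
        best = None  # (probability, insert position, word)
        for i in range(len(words) - 1):
            # Unknown pairs default to "insert nothing" with probability 1.
            dist = model.get((words[i], words[i + 1]), {None: 1.0})
            word, p = max(dist.items(), key=lambda item: item[1])
            if word is not None and (best is None or p > best[0]):
                best = (p, i + 1, word)
        if best is None:
            break  # "no word to be inserted" is the most probable for every gap
        # The inserted word is then treated like a keyword in the next cycle.
        words.insert(best[1], best[2])
    return words

model: Model = {
    ("kanojo", "kouen"): {"wa": 0.7, None: 0.3},
    ("kouen", "itta"): {"e": 0.8, None: 0.2},
    ("kanojo", "wa"): {None: 0.9, "mo": 0.1},
}

result = " ".join(insert_words(["kanojo", "kouen", "itta"], model))
```

In this run "e" is inserted first (probability 0.8), then "wa" (0.7), after which every gap is most likely to receive no word and the loop stops.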

In the text generation method of the present invention, the database stores a text having characteristic text patterns. A text reflecting such characteristic text patterns is thus generated. A natural text the reader comfortably reads is thus provided.

In accordance with the present invention, the text generation apparatus performing the above-referenced text generation method is provided, and contributes to an advance of natural language processing techniques.

1. A text generation method for generating a text including a sentence, comprising: an input step for inputting at least a word as a keyword through input means, an extracting step for extracting, from a database, a text or a phrase related to the keyword through extracting means, and a text generation step for generating an optimum text based on the input keyword by combining the text or the phrase extracted by text generation means.
2. A text generation method according to claim 1, wherein in an arrangement where the text is extracted in the extracting step, parser means morphologically analyzes and parses the extracted text in the text generation step, and acquires a dependency structure of the text, and wherein dependency structure generation means generates a dependency structure containing the keyword.
3. A text generation method according to claim 2, wherein in the course of generating the dependency structure containing the keyword in the text generation step, the dependency structure generation means determines the probability of dependency of the entire text using a dependency model, and wherein the text generation means generates a text having a maximum probability as an optimum text.
4. A text generation method according to claim 2 or 3, wherein in the middle of or after the generation of the dependency structure in the text generation step, the text generation means generates an optimum text having a natural word order based on a word order model.
5. A text generation method according to claim 1, wherein in the text generation step, word inserting means determines, using a learning model, whether there is a word to be inserted between any two keywords in all arrangements of the keywords, and performs a word insertion process starting with a word having the highest probability in the learning model, wherein the word insertion means performs the word insertion process by including, as a keyword, a word to be inserted, or then removing the word as the keyword, and by repeating the cycle of word inclusion and removal until a probability that there is no word to be inserted between any keywords becomes the highest.
6. A text generation method according to claim 1, wherein in an arrangement where the database contains a text having a characteristic text pattern, the text generation means generates a text in compliance with the characteristic text pattern.
7. A text generation apparatus for generating a text of a sentence, comprising: input means for inputting at least one word as a keyword, extracting means for extracting, from a database containing a plurality of texts, a text or a phrase related to the keyword, and text generation means for generating an optimum text based on the input keyword by combining the extracted text or phrase.
8. A text generation apparatus according to claim 7, wherein in an arrangement where the text extracting means extracts the text, the text generation means comprises parser means for morphologically analyzing and parsing the extracted text, and acquiring a dependency structure of the text, and dependency structure generation means for generating a dependency structure containing the keyword.
9. A text generation apparatus according to claim 8, wherein in the text generation means, the dependency structure generation means determines the probability of dependency of the entire text using a dependency model, and generates a text having a maximum probability as an optimum text.
10. A text generation apparatus according to claim 8 or 9, wherein in the middle of or prior to the generation of the dependency structure, the text generation means generates an optimum text having a natural word order based on a word order model.
11. A text generation apparatus according to claim 7, wherein the text generation means comprises word insertion means that determines, using a learning model, whether there is a word to be inserted between any two keywords in all arrangements of the keywords, and performs a word insertion process starting with a word having the highest probability in the learning model, wherein the word insertion means performs the word insertion process by including, as a keyword, a word to be inserted, or then removing the word as the keyword, and by repeating the cycle of word inclusion and removal until a probability that there is no word to be inserted between any keywords becomes the highest.
12. A text generation apparatus according to claim 7, wherein in an arrangement where the database contains a text having a characteristic text pattern, the text generation means generates a text in compliance with the characteristic text pattern.
13. A text generation apparatus according to claim 12, comprising pattern selecting means that contains one or a plurality of databases containing texts having a plurality of characteristic text patterns, and selects a desired text pattern from the plurality of text patterns.